
feat: add support for FastSAM model with point, box and text prompts #1120

Merged
msluszniak merged 18 commits into main from @bh/add-fast-sam on May 12, 2026

Conversation

@barhanc (Contributor) commented May 5, 2026

Description

Adds support for the FastSAM model, with the required postprocessing for point, box, and text prompts (text prompts use the already existing CLIP export). Also adds an example app to test these.

Since FastSAM uses a YOLO instance segmentation backbone with some clever postprocessing to imitate Facebook's SAM (see https://docs.ultralytics.com/models/fast-sam/#model-architecture), we reuse the existing instance segmentation C++ implementation and add TypeScript postprocessing to minimize code duplication.
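For context, the prompt handling is pure postprocessing over the instance masks, which is what makes it compose cleanly with the existing pipeline: a point prompt selects a mask containing the point, and a box prompt selects the mask whose bounding box best overlaps the drawn box. Here is a minimal TypeScript sketch of that idea (the types and selector names are illustrative assumptions, not the library's actual API):

```ts
// Illustrative shapes -- the real types live in react-native-executorch.
interface Bbox { x1: number; y1: number; x2: number; y2: number }
interface Instance { bbox: Bbox; mask: Uint8Array; maskWidth: number }

// Point prompt: pick an instance whose binary mask covers the tapped pixel.
// (A real implementation may break ties, e.g. by preferring the smallest mask.)
function selectByPoint(instances: Instance[], x: number, y: number): Instance | undefined {
  return instances.find(
    (inst) => inst.mask[Math.round(y) * inst.maskWidth + Math.round(x)] === 1
  );
}

// Box prompt: pick the instance whose bounding box has the highest IoU
// with the user-drawn box.
function selectByBox(instances: Instance[], box: Bbox): Instance | undefined {
  const area = (r: Bbox) => Math.max(0, r.x2 - r.x1) * Math.max(0, r.y2 - r.y1);
  const iou = (a: Bbox, b: Bbox) => {
    const iw = Math.max(0, Math.min(a.x2, b.x2) - Math.max(a.x1, b.x1));
    const ih = Math.max(0, Math.min(a.y2, b.y2) - Math.max(a.y1, b.y1));
    const inter = iw * ih;
    return inter / (area(a) + area(b) - inter || 1);
  };
  let best: Instance | undefined;
  let bestIou = 0;
  for (const inst of instances) {
    const score = iou(inst.bbox, box);
    if (score > bestIou) {
      bestIou = score;
      best = inst;
    }
  }
  return best;
}
```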

Introduces a breaking change?

  • Yes
  • No

Type of change

  • Bug fix (change which fixes an issue)
  • New feature (change which adds functionality)
  • Documentation update (improves or adds clarity to existing documentation)
  • Other (chores, tests, code style improvements etc.)

Tested on

  • iOS
  • Android

Testing instructions

  • Run the Computer Vision - Segment Anything app screen and test the two available models there.
  • You can also run the Computer Vision - Instance Segmentation app screen and test the two newly added models there.
  • You can also run the Computer Vision - Vision Camera app screen to test the real-time performance of the two new models under 'Instance Segmentation'.
  • Check the Hugging Face page for the exported models: https://huggingface.co/software-mansion/react-native-executorch-fast-sam

Screenshots

You can use the following image for testing.

https://upload.wikimedia.org/wikipedia/commons/c/cd/Animal_diversity_October_2007.jpg

[Simulator screenshots: iPhone 17 Pro, 2026-05-06 at 23:12:00 and 23:15:22]

Related issues

Closes #555

Checklist

  • I have performed a self-review of my code
  • I have commented my code, particularly in hard-to-understand areas
  • I have updated the documentation accordingly
  • My changes generate no new warnings

Additional notes

@barhanc barhanc self-assigned this May 5, 2026
@barhanc barhanc added the feature (PRs that implement a new feature) and model (Issues related to exporting, improving, fixing ML models) labels May 5, 2026
@barhanc barhanc changed the title from "feat: add support for FastSAM model with point and box prompts" to "feat: add support for FastSAM model with point, box and text prompts" May 6, 2026
@barhanc barhanc marked this pull request as ready for review May 6, 2026 22:12
@barhanc barhanc requested review from chmjkb and msluszniak May 6, 2026 22:45
@msluszniak (Member) left a comment

Do we want to add some benchmarks for this one?

Comment thread packages/react-native-executorch/src/constants/modelUrls.ts Outdated
Comment thread packages/react-native-executorch/src/utils/segmentAnythingPrompts.ts Outdated
Comment thread apps/computer-vision/app/segment_anything/index.tsx
@chmjkb (Collaborator) left a comment

I tested the demo app on iOS and the results were pretty mid, at least for the S version. Not sure if this is the nature of the model, but just saying.

Comment thread apps/computer-vision/components/vision_camera/tasks/InstanceSegmentationTask.tsx Outdated
Comment thread docs/docs/03-hooks/02-computer-vision/segment-anything.md Outdated
Comment thread apps/computer-vision/app/segment_anything/index.tsx
Comment thread packages/react-native-executorch/src/utils/segmentAnythingPrompts.ts Outdated
@msluszniak (Member) commented

I tested the demo app on iOS and the results were pretty mid, at least for the S version. Not sure if this is the nature of the model, but just saying.

Probably the nature of the model. If you share the results you get, I can do a cross-check.

@barhanc barhanc force-pushed the @bh/add-fast-sam branch from 252b53e to decff98 May 8, 2026 10:22
@barhanc (Contributor, Author) commented May 8, 2026

I tested the demo app on iOS and the results were pretty mid, at least for the S version. Not sure if this is the nature of the model, but just saying.

From what I've tested, the S variant is fine for simple segmentation when objects don't overlap, but it's true that artifacts show up in more complex scenes. The X variant, however, worked fine on all images I tried, even ones with quite complex scenes. Did you observe bad performance on the X variant as well?

@chmjkb (Collaborator) commented May 8, 2026

I tested the demo app on iOS and the results were pretty mid, at least for the S version. Not sure if this is the nature of the model, but just saying.

From what I've tested, the S variant is fine for simple segmentation when objects don't overlap, but it's true that artifacts show up in more complex scenes. The X variant, however, worked fine on all images I tried, even ones with quite complex scenes. Did you observe bad performance on the X variant as well?

I gave the X version some more testing, and the point/box-based detections look cool, but the text-prompt-based ones were pretty bad. I almost never got the result I expected, but maybe it's due to the poor quality of the text embeddings. Quick example:

[Screenshot 2026-05-08 at 15:11:47]

@barhanc (Contributor, Author) commented May 11, 2026

@msluszniak @chmjkb I've:

  • updated the docs by adding a section on selector use to useInstanceSegmentation.md; the previous file didn't really fit under hooks/ in my opinion.
  • moved the helper function bboxArea to utils/commonVision.ts as requested.
  • fixed keyboard handling - hopefully it feels better now.
  • fixed a problem with the way cropped images were passed to CLIP for embeddings. Previously I followed exactly how it's done in the ultralytics Python implementation, where the cropped image is simply passed without masking; now the cropped image is masked based on the segmentation mask, so it should work better on examples like the one posted above. There are still some problems with text prompts on some images, but those are due to the CLIP model not being able to properly embed images with certain parts masked out.
  • added an optional topk parameter to selectByText that lets the user get the top-k instances best matching the given text prompt (a rough sketch follows below). This is mostly for convenience and doesn't truly solve the problem of returning multiple instances, since topk must be passed explicitly; e.g. it doesn't cover use cases where the user would like to automatically count the objects matching a certain prompt. The problem is that text/image embeddings are inherently contrastive, so a similarity threshold doesn't really solve it either. I could add a function that does something like open-vocabulary classification, where the user inputs text prompts that correspond to classes and for each segmented instance we return the best-matching class based on cosine similarity, but I'm not sure that's needed. What do you think?

I will also be adding benchmarks shortly.
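For reference, below is a rough sketch of how selectByText with the topk parameter could work under the approach described above: mask each instance's crop, embed it with the CLIP image encoder, and rank instances by cosine similarity to the text embedding. The signature and the encoder callbacks are assumptions for illustration, not the library's actual API:

```ts
// Illustrative sketch -- assumes CLIP encoders that return L2-normalized
// embeddings, so the dot product equals the cosine similarity.
function cosineSimilarity(a: Float32Array, b: Float32Array): number {
  let dot = 0;
  for (let i = 0; i < a.length; i++) dot += a[i] * b[i];
  return dot;
}

// Generic over the instance type; `embedImage` is assumed to crop the
// instance, zero out pixels outside its segmentation mask (the fix
// described above), and run the CLIP image encoder on the result.
async function selectByText<T>(
  instances: T[],
  prompt: string,
  embedImage: (instance: T) => Promise<Float32Array>,
  embedText: (text: string) => Promise<Float32Array>,
  topk: number = 1
): Promise<T[]> {
  const textEmb = await embedText(prompt);
  const scored = await Promise.all(
    instances.map(async (inst) => ({
      inst,
      score: cosineSimilarity(await embedImage(inst), textEmb),
    }))
  );
  // topk must be passed explicitly: contrastive embeddings give relative
  // rankings, not a natural similarity threshold to cut at.
  return scored
    .sort((a, b) => b.score - a.score)
    .slice(0, topk)
    .map((s) => s.inst);
}
```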

@barhanc barhanc requested review from chmjkb and msluszniak May 11, 2026 10:36
@msluszniak (Member) left a comment

LGTM from my side. We could add a tip to the documentation that, for images with overlapping entities, FastSAM-X is preferred over the smaller version.

@chmjkb (Collaborator) commented May 12, 2026

  • added an optional topk parameter to selectByText that lets the user get the top-k instances best matching the given text prompt. This is mostly for convenience and doesn't truly solve the problem of returning multiple instances, since topk must be passed explicitly; e.g. it doesn't cover use cases where the user would like to automatically count the objects matching a certain prompt. The problem is that text/image embeddings are inherently contrastive, so a similarity threshold doesn't really solve it either. I could add a function that does something like open-vocabulary classification, where the user inputs text prompts that correspond to classes and for each segmented instance we return the best-matching class based on cosine similarity, but I'm not sure that's needed. What do you think?


I think the topk should be enough; no need to over-engineer this. I'll review it now, and if it looks OK then :shipit:

@msluszniak msluszniak merged commit c0e3a83 into main May 12, 2026
5 checks passed
@msluszniak msluszniak deleted the @bh/add-fast-sam branch May 12, 2026 13:37

Labels

feature (PRs that implement a new feature), model (Issues related to exporting, improving, fixing ML models)

Development

Successfully merging this pull request may close these issues.

SAM - segment anything model
